1 Introduction

Here we present a summary of the processing steps applied to the WASH dataset.

  • The aim is to investigate the contribution of demographic, social and economic factors to improved water, sanitation and hygiene (WASH) among the urban poor.



1.1 Activities

  • Three WASH variables were created as per the WHO definition (Damazo). See the codebook for variable labels.

    • cat_watersource

    • cat_toilettype

    • cat_garbagedisposal

  • For every variable, cases with NIU or Missing: Impute were recoded to NA.

  • Cases which had NA in Gender and Age were completely dropped.

  • Smaller groups were combined into an "Others" category.

  • A composite variable was created from the three WASH variables using logistic PCA.

    • PC scores were used to create categories (quantiles).

  • Total household expenditure was centered.

  • For every case, we summed the number of WASH indicators they had access to (max = 3) and calculated the proportion (not sure what to call this rate). Is it possible to model the total as a Poisson process?

  • Visualization plots were created for the individual WASH variables, but initial modelling is on the composite WASH variable.

  • We also present the results from a generalized linear mixed-effects model (GLMM) fitted with the lme4 package (glmer).
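The recoding, case-dropping, centering and indicator-count steps listed above can be sketched in a few lines. This is only an illustration in plain Python, not the code used for the report. The column names `gender`, `age` and `total_expenditure` are hypothetical, each WASH variable is assumed to be coded 1 (access to an improved service) or 0, a case is assumed to be dropped when either Gender or Age is NA, and the count is left missing when any of the three indicators is NA. None of these choices is confirmed by the source.

```python
from statistics import mean

# Labels recoded to NA (None stands in for NA in this sketch).
MISSING_LABELS = {"NIU", "Missing: Impute"}
WASH_VARS = ["cat_watersource", "cat_toilettype", "cat_garbagedisposal"]


def recode_missing(row):
    """Replace NIU / Missing: Impute labels with None in every variable."""
    return {key: (None if value in MISSING_LABELS else value)
            for key, value in row.items()}


def wash_total(row):
    """Count the indicators a case has access to (assumed 1/0 coding)."""
    values = [row[v] for v in WASH_VARS]
    if any(v is None for v in values):
        return None  # assumption: no count when an indicator is NA
    return sum(values)


def preprocess(rows):
    rows = [recode_missing(r) for r in rows]
    # Assumption: drop a case when Gender or Age is NA.
    rows = [r for r in rows if r["gender"] is not None and r["age"] is not None]
    # Center expenditure on the mean of the retained, non-missing values.
    centre = mean(r["total_expenditure"] for r in rows
                  if r["total_expenditure"] is not None)
    out = []
    for r in rows:
        total = wash_total(r)
        spend = r["total_expenditure"]
        out.append({
            **r,
            "wash_total": total,
            "wash_prop": None if total is None else total / len(WASH_VARS),
            "expenditure_centered": None if spend is None else spend - centre,
        })
    return out


# Toy records, invented for illustration only.
raw = [
    {"gender": "Female", "age": 34, "total_expenditure": 100.0,
     "cat_watersource": 1, "cat_toilettype": 0, "cat_garbagedisposal": 1},
    {"gender": "NIU", "age": 51, "total_expenditure": 250.0,
     "cat_watersource": 1, "cat_toilettype": 1, "cat_garbagedisposal": 1},
    {"gender": "Male", "age": 27, "total_expenditure": 300.0,
     "cat_watersource": 1, "cat_toilettype": 1, "cat_garbagedisposal": 1},
]
clean = preprocess(raw)
```

In this toy input the second record is dropped because its gender is recoded to NA, so the expenditure mean is taken over the two remaining cases.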



1.2 Codebook



2 Data Exploration

2.1 Missingness

The table below summarizes the proportion of missingness for all the variables.

  • No variable had \(100\%\) missingness, so none were dropped.
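As a rough illustration of how such a missingness summary can be computed, the sketch below takes the share of NA values per variable and drops any variable that is entirely missing. It is written in plain Python on invented toy records (the helper names and the variables `age`, `ethnicity` and `hunger_scale` are hypothetical); the report's own table comes from the actual dataset.

```python
def missing_proportions(rows):
    """Return {variable: share of rows in which the value is None}."""
    n = len(rows)
    return {col: sum(r[col] is None for r in rows) / n for col in rows[0]}


def drop_fully_missing(rows):
    """Drop variables whose missingness is exactly 100%."""
    props = missing_proportions(rows)
    dropped = [col for col, p in props.items() if p == 1.0]
    kept = [{c: v for c, v in r.items() if c not in dropped} for r in rows]
    return kept, dropped


# Toy records, invented for illustration only.
rows = [
    {"age": 34, "ethnicity": None, "hunger_scale": None},
    {"age": None, "ethnicity": "A", "hunger_scale": None},
    {"age": 27, "ethnicity": "B", "hunger_scale": None},
    {"age": 45, "ethnicity": "A", "hunger_scale": None},
]
props = missing_proportions(rows)
kept, dropped = drop_fully_missing(rows)
```

Unlike the real data, the toy input contains one fully missing variable so that the drop step has something to act on.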



2.2 Descriptives

We begin by showing the distribution of the individual WASH variables (indicators) over time and space (slum area). Thereafter, we show the distribution of the demographic, social and economic variables of interest by the composite WASH variable.

2.2.1 Water source



2.2.2 Toilet type



2.2.3 Garbage disposal



2.2.4 Composite WASH indicator variable



2.2.4.1 Composite WASH indicator and Gender



2.2.4.2 Composite WASH indicator and Age



2.2.4.3 Composite WASH indicator and Ethnicity



2.2.4.4 Composite WASH indicator and Hunger scale



2.2.4.5 Composite WASH indicator and Poverty line



2.2.4.6 Composite WASH indicator and Wealth quintile



2.2.4.7 Composite WASH indicator and Total household expenditure



2.2.5 Proportion of WASH indicators the respondent had access to
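
One way to write down the quantity summarized here, assuming each indicator \(x_{ij}\) is coded 1 when respondent \(i\) has access to indicator \(j\) and 0 otherwise (the coding is not stated in this report, and the symbols are introduced only for illustration):

\[total_i = \sum_{j=1}^{3} x_{ij}, \qquad prop_i = \frac{total_i}{3}\]

Because \(total_i\) cannot exceed 3, a binomial model with three trials may be a more natural starting point than a Poisson model, whose support is unbounded. This is offered as a possibility to consider, not as something the analysis has tested.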



3 Data Analysis

3.1 Generalized Linear Mixed Models

3.1.1 Model 1

The specification for this model was as follows:

\[wash = demographs + social + economics + slum + year + \mathbf{(1 + year|hh\_id)}\]
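
Written out more formally, and assuming the composite WASH response is binary and modelled with a logit link (neither is stated explicitly here), the specification would read as follows for household \(i\) at interview year \(t\). The symbols \(\mathbf{x}_{it}\), \(\boldsymbol{\beta}\), \(b_{0i}\), \(b_{1i}\) and \(\boldsymbol{\Sigma}\) are introduced only for this sketch:

\[\operatorname{logit} P(wash_{it} = 1) = \mathbf{x}_{it}^{\top}\boldsymbol{\beta} + b_{0i} + b_{1i}\, year_{it}, \qquad (b_{0i}, b_{1i})^{\top} \sim N(\mathbf{0}, \boldsymbol{\Sigma})\]

Here \(\mathbf{x}_{it}\) collects the demographic, social and economic covariates together with slum and year, and \(\boldsymbol{\Sigma}\) is the \(2 \times 2\) covariance matrix of the household-level random intercept and random year slope.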

Random effects:

  • hhid_anon:

                   (Intercept)   intvwyear
    (Intercept)         2.703     -0.2476
    intvwyear          -0.2476     0.02657
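
If the matrix above is read as the variance-covariance matrix of the random intercept and the random year slope (an assumption; the output does not label it), the implied standard deviations and intercept-slope correlation follow from simple arithmetic. The sketch below only reuses the three numbers printed above.

```python
import math

# Values copied from the random-effects matrix in this section.
# Treating them as variances and a covariance is an assumption.
var_intercept = 2.703
var_slope = 0.02657
cov_intercept_slope = -0.2476

sd_intercept = math.sqrt(var_intercept)
sd_slope = math.sqrt(var_slope)

# Correlation = covariance divided by the product of standard deviations.
corr = cov_intercept_slope / (sd_intercept * sd_slope)

print(round(sd_intercept, 3), round(sd_slope, 3), round(corr, 3))
```

Under that reading the correlation is roughly -0.92, which would suggest that households with a higher random intercept tend to have a lower random slope over interview year. How the intercept itself should be interpreted depends on how the year variable is coded, which is not shown here.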

Fixed effects:

3.1.2 Model 2

Mike and Morgan suggest another way to fit a kind of multivariate GLMM: reshape the dataset to long format (along the WASH variables) and then treat the new ‘WASH indicator’ as one of the fixed effects.


\[wash = (demographs + social + economics + slum + year)*wash\_indicator + \mathbf{(1 + year|hh\_id:wash\_indicator)}\]
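
The reshaping step can be sketched as below: each household-year record becomes three rows, one per WASH variable, with a `wash_indicator` column naming the variable and a `wash` column holding its value. This is a plain-Python illustration on invented records, not the report's code. The helper `to_long`, the `group` column and the 1/0 coding are assumptions; `group` simply mirrors the `hh_id:wash_indicator` grouping factor in the formula above.

```python
WASH_VARS = ["cat_watersource", "cat_toilettype", "cat_garbagedisposal"]


def to_long(rows, id_vars):
    """Stack the WASH variables: one output row per case and indicator."""
    long_rows = []
    for r in rows:
        for indicator in WASH_VARS:
            long_row = {k: r[k] for k in id_vars}
            long_row["wash_indicator"] = indicator
            long_row["wash"] = r[indicator]
            # Grouping key for the household-by-indicator random effects.
            long_row["group"] = f"{r['hh_id']}:{indicator}"
            long_rows.append(long_row)
    return long_rows


# Toy wide-format records, invented for illustration only.
wide = [
    {"hh_id": "h1", "year": 2006,
     "cat_watersource": 1, "cat_toilettype": 0, "cat_garbagedisposal": 1},
    {"hh_id": "h2", "year": 2006,
     "cat_watersource": 0, "cat_toilettype": 0, "cat_garbagedisposal": 1},
]
long_rows = to_long(wide, id_vars=["hh_id", "year"])
```

Two wide records yield six long records. Covariates would be carried along by adding their names to `id_vars`, so that each can interact with `wash_indicator` in the model.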